Learning Nash Equilibrium for General-Sum Markov Games from Batch Data
Authors
Abstract
This paper addresses the problem of learning a Nash equilibrium in γ-discounted multiplayer general-sum Markov Games (MGs) in a batch setting. As the number of players in an MG increases, the agents may either collaborate or oppose one another to increase their final rewards. One way to address this problem is to look for a Nash equilibrium. Although several techniques exist for the subcase of two-player zero-sum MGs, they fail to find a Nash equilibrium in general-sum Markov Games. In this paper, we introduce a new definition of ε-Nash equilibrium in MGs that captures the quality of a strategy profile in multiplayer games. We prove that minimizing the norm of two Bellman-like residuals implies learning such an ε-Nash equilibrium. We then show that minimizing an empirical estimate of the Lp norm of these Bellman-like residuals allows learning in general-sum games within the batch setting. Finally, we introduce a neural network architecture that successfully learns a Nash equilibrium in generic multiplayer general-sum turn-based MGs.
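The central quantity of the abstract is the empirical Lp norm of a Bellman-like residual computed over a batch of transitions. The sketch below is only a rough illustration of what such an estimate could look like for a turn-based MG; it is not the paper's actual formulation (which involves two residuals and neural-network function approximation), and the data layout (`batch`, `q`, `pi`, `player_to_move`) and the single-residual form are assumptions made for this example.

```python
import numpy as np

def empirical_lp_bellman_residual(batch, q, pi, player_to_move, gamma=0.9, p=2):
    """Empirical Lp norm of a one-step Bellman-like residual on a batch.

    batch          : iterable of (s, a, rewards, s_next), one reward per player
    q              : q[i][s][a]  -- player i's Q-value estimate
    pi             : pi[j][s][.] -- strategy of player j, who controls state s
    player_to_move : callable mapping a state to the controlling player's index
    """
    residuals = []
    for (s, a, rewards, s_next) in batch:
        j = player_to_move(s_next)  # whose turn it is in the next state
        for i, r_i in enumerate(rewards):
            # Bellman-like target for player i under the joint strategy pi
            target = r_i + gamma * np.dot(pi[j][s_next], q[i][s_next])
            residuals.append(abs(q[i][s][a] - target) ** p)
    return np.mean(residuals) ** (1.0 / p)
```

In a batch setting one would presumably minimize such a quantity with respect to the parameters of the Q-functions and strategies, e.g., by gradient descent on a parametric (neural-network) representation.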
Similar Papers
Learning in Markov Games with Incomplete Information
The Markov game (also called stochastic game (Filar & Vrieze 1997)) has been adopted as a theoretical framework for multiagent reinforcement learning (Littman 1994). In a Markov game, there are n agents, each facing a Markov decision process (MDP). All agents’ MDPs are correlated through their reward functions and the state transition function. As Markov decision process provides a theoretical ...
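To make the framework described in this snippet concrete, a minimal encoding of an n-player Markov game might look like the following; the container names and types are illustrative only and are not taken from any of the cited papers.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

JointAction = Tuple[int, ...]  # one action index per player

@dataclass
class MarkovGame:
    """n agents sharing one state space and one transition function,
    each with its own reward function over joint actions."""
    n_players: int
    n_states: int
    n_actions: int
    # transition[(s, joint_action)] -> probability distribution over next states
    transition: Dict[Tuple[int, JointAction], Dict[int, float]]
    # rewards[i][(s, joint_action)] -> reward received by player i
    rewards: List[Dict[Tuple[int, JointAction], float]]
```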
Cyclic Equilibria in Markov Games
Although variants of value iteration have been proposed for finding Nash or correlated equilibria in general-sum Markov games, these variants have not been shown to be effective in general. In this paper, we demonstrate by construction that existing variants of value iteration cannot find stationary equilibrium policies in arbitrary general-sum Markov games. Instead, we propose an alternative i...
Nonzero-sum Risk-sensitive Stochastic Games on a Countable State Space
The infinite horizon risk-sensitive discounted-cost and ergodic-cost nonzero-sum stochastic games for controlled Markov chains with countably many states are analyzed. For the discounted-cost game, we prove the existence of Nash equilibrium strategies in the class of Markov strategies under fairly general conditions. Under an additional geometric ergodicity condition and a small cost criterion,...
A Study of Gradient Descent Schemes for General-Sum Stochastic Games
Zero-sum stochastic games are easy to solve as they can be cast as simple Markov decision processes. This is however not the case with general-sum stochastic games. A fairly general optimization problem formulation is available for general-sum stochastic games by Filar and Vrieze [2004]. However, the optimization problem there has a non-linear objective and non-linear constraints with special s...
Multiagent reinforcement learning: algorithm converging to Nash equilibrium in general-sum discounted stochastic games
Reinforcement learning has turned out to be a technique that allowed robots to ride a bicycle, computers to play backgammon at the level of human world masters, and such complicated, high-dimensional tasks as elevator dispatching to be solved. Can it come to the rescue in the next generation of challenging problems like playing football or bidding on virtual markets? Reinforcement learning that provides a way o...